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(57) Abstract: Methods are disclosed whereby variants of proteins which do iK>t confer selectable (dienotypes can nevertheless 
be selected for stable expression in heterologous hosts. Related methods are disclosed whereby cDNA expression libraries can 
be enriched for stable expression of autonomously folding domains in heterologous hosts. Related methods are further disclosed 
whereby peptides which stabilize unstable iHOteins may be selected from random peptide libraries. If a heterologous protein is 
expressed as a fusion with a selectable phenotype, the strength of the phenotype is proportional to the folding rate, and therefore 
die solubility of the protein of interest Thus, the selectable phenotype can be used to select better expressors from libraries of 
mutagenized proteins of interest, or it can be used to select autonomously folding domains ( AFD) from cDNA expression libraries, 
or it can be used to select peptides which stabilize unstable proteins. 
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A GENERAL M ETHOD FOR OPTIMIZING THE EXPRESSION 
OF HETEROLOGOUS PROTEINS 

Technical Field 

This invention is related to methods and compositions for obtaining stable expression of 
a protein of interest, to obtain mutant proteins having enhanced stability as compared to wild 
type proteins, and to stabilize unstable proteins associated with disease by expressing the protein 
of interest as a chimeric protein composed of a selection marker and the protein of interest under 
selective conditions. The invention is exampiified by preparation of highly fluorescent OFF 
mutants by expressing the GPP as a C-ierminal fusion with chloramphenicol acetyltransferase. 

INTRODUCTION 

Background 

Natural proteins have three ftindamenial propenies in vivo which can be exploited to 
obtain stable expression of heterologous proteins in the absence of any distinguishing 
phenotype of the protein itself. 



1. Natural proteins have unique minimum energy conformations. 

In its broadest sense, a fold is simply a minimum energy conformation or ground state. 
The vast majority of sequences in protein sequence space do not have unique folds, but rather 
have multiple, inter-convertible minimum energy conformations (Li et al.. Science (1996) 
273:666-669; Godzik, TIBTECH (1997) 15:147-151; Sauer, Folding and Design (1996) 
l:R27-R30; Govindarajan and Goldstein, Proc, Natl. Acad, Sci USA (1996) 93:3341-3345). 
However, protein function is so exquisitely dependent on specific tertiary structure that it is 
generally not possible for a protein to be functional in more than one fold. For this reason 
evolution has selected sequences which fold cooperatively in the intracellular milieu into 
unique minimum energy conformations with high energy barriers between them and all 
kineucally accessible alternatives (Govindarajan and Goldstein, Proc. NatL Acad, Sci USA 
(1996) 93:3341-3345). A general feature of these folds is a compact hydrophobic core. If, 
under adverse conditions of temperamre, pH, ionic strength, etc., such as might occur during 
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physical or metabolic stress, the hydrophobic cores are disrupted or fail to form properly or in 
a timely fashion inside the celK the exposed hydrophobic surfaces have a strong tendency to 
initiate intermolecular aggregation, which is generally toxic to cells. In fact, the aggregation of 
misfolded proteins has become increasingly recognized as a conmion eiiologic component of 
5 disease (Wetzel, Cell (1996) 86:699-702). For this and other regulatory reasons cells have 
evolved highly efficient proteolytic mechanisms to detect exposed hydrophobic structure and 
prevent misfolded proteins from accumulating (Weissman et aL, Science (1995) 268:523-524; 
Coux et aL, Ann. Rev. Biochem. (1996) 65:801; Tamura et aL, Science (1996) 274:1385- 
1389; Lowe et aL, Science (1996) 268:533-539). 

i(> Nascent proteins partition among three fates in vivo: (I) folding into the soluble, 

functional, native ground state, (2) amorphous aggregation into insoluble inclusions, (3) 
proteolysis. Fate (I) is generally proportional to the folding rate. The faster a protein folds 
the more of it will reach the native ground state before succumbing to fates (2) and (3), which 
are essentially irreversible. With respect to stability, the most important effect of folding is to 

15 sequester hydrophobic surface, since exposed hydrophobic surface is the driving force for both 
aggregation and proteolysis. From these properties of natural proteins in vivo, it may be 
inferred that most natural proteins can only be stable in vivo in their native conformations in 
which they are also likely to be functional. Thus, most mutations which stabilize a natural 
protein in a heterologous host, will generally favor the native conformation, and will restore at 

20 least some measure of native functionality. 

2. A multi-domain protein is only as stable in vivo as its least stable domain. 
Inside the cell, nascent proteins first encounter the hsp40 and hsp70 classes of 

chaperone proteins and their associates which bind to any exposed hydrophobic sequence to 
25 protect the nascent protein in its unfolded state (Hartl, Nature (1996) 381 :57 1-580). The new 

protein is then released from the chaperones by a cooperative, energy-dependant mechanism. 

Many proteins then fold with two-state kinetics, collapsing rapidly into tiieir native folds 

without discemable intermediates. The remainder, however, may accumulate as * molten 

globule' intermediates while searching for conformation space for their native folds. Since in 
30 this state they are vulnerable to aggregation or proteolysis, the hsp60 chaperonin system has 

evolved to provide a protective environment for slower folders (Hartl, 1996). Misfolded 



wo 01/29225 



PCT/US00/084T; 



proteins are drawn into the multimeric Hsp60 cylindrical complexes (illustrated in Figure I), 
where they are bound to the inner surface in a fully extended state (Zahn et aL, Nature (1994) 
368:261-265; Buckle etaL, Proc, Natl. Acad. ScL USA (1997) 94:3571-3575). Each protein 
then undergoes cooperative, energy-dependant release into the cavity of the complex where it 
can attempt to fold without risk of aggregation or proteolysis. Thus, the folding machinery 
acts not by catalyzing or accelerating folding, but by protecting nascent proteins from 
alternative fates. 

Any protein, new or old, which undergoes sufficient transient thermal or chemical 
denaturation to expose hydrophobic surface may be bound and unfolded by the chaperonin 
complex. Each protein may then undergo multiple rounds of binding, unfolding, release, and 
refolding until its native fold is achieved. However, after each round in which a protein still 
fails to achieve its native fold, it may either be rebound by the folding complex, or it may be 
bound by the protein mmover machinery, which also recognizes exposed hydrophobic 
surfaces. These alternative fates for nascent proteins are illustrated in Figure I. Thus, the 
longer it takes a protein to fold, the more vulnerable it is to proteolysis or aggregation. 
Proteins which incur significant delays in folding in heterologous hosts may fail to accumulate 
for this reason. 

There are many apparent parallels between mechanisms of protein folding and turnover 
in ceils (Weissman et aL, Science (1995) 268:523-524). In both cases exposed hydrophobic 
surfaces are recognized and bound cooperatively, leading to unfolding of the entire protein 
within multi-subunit complexes having similar cylindrical architectures (Weissman et al.. 
Science (1995) 268:523-524; Zahn et al.. Nature (1994) 368:261-265; Buckle et aL. Proc, 
NatL Acad. Sci. USA (1997) 94:3571-3575). However, whereas in the folding process the 
bound protein is released to fold again, in the proteosome, the bound protein is proteolyzed 
(see Fig. 1). Thus, the ability of proteins to accumulate in cells depends on both the rate of 
folding and the stability of the final fold, though recent work has suggested that these two may 
in fact be related (Scalley and Baker, Proc. Natl, Acad. ScL USA (1997) 94:10636-40). From 
the foregoing it may be surmised that if any single domain in a multi-domain protein is 
unstable, the entire protein may be subject to proteosomal proteolysis. This is logical given 
that any surviving stable fragments from proteolysis of natural proteins would likely be either 
useless or deleterious, especially if they retained unregulated activity. Indeed, stable 



wo 01729225 



PCT/US00/084r7 



4 

fragments of natural proteins are rarely detected in cells unless they have functional 
significance. From this it follows that the su-ength of a selectable cellular phenotype should be 
proportional to the "foldability" of any domain to which the phenotype-conferring domain is 
fused, and that this proportionality could have many useful applications in protein engineering. 

5 

3. Mutations which stabilize proteins in vivo promote the native fold and function. 

As discussed above, the vast majority of sequences in protein sequence space are not 
'•foldable", i.e., they do not specify unique minimum energy conformations. The tiny fraction 
that do specify unique stable folds have been selected by evolution to serve as scaffolds for 

10 protein function. Interestingly, computational experiments have suggested that the most stable 
folds are also the most "designable" (Li et aL, Science (1996) 273:666-669). That is, they 
may be specified by the largest number of different sequences. This makes them 
evolutionarily stable as well as thermodynamically stable. Among natural proteins a number 
of **superfolds'' have been observed, such as the immunoglobulin fold, a - 12 kDa sandwich 

15 of two 3-5-strand p-sheets, which has been adapted repeatedly to many different functions, 
specified by many different sequences (Padlan, Molecular Immunology (1994) 31:169-217). 
An increasing number of other studies have shown that extensive sequence alterations may be 
made in the cores of model proteins without substantially altering the native fold or function 
(Axe ei aL. Proc. Natl, Acad. ScL USA (1996) 93:5590-5594; Sauer, Folding and Design 

20 (1996) 1:R27-R30). 

Many of the known protein folds have been observed in both prokaryotic and 
eukaryotic proteins (Netzer and Hartl, Nature (1997) 388:343-349). This is consistent with the 
fact that the intracellular milieus of prokaryotic and eukaryotic cells are quite similar with 
respect to bulk propenies such as pH, ionic strength, protein concentration, etc. In spite of 

25 this, many natural proteins are unstable in heterologous hosts in that they eidier fail to 

accumulate to detectable levels, or when over-expressed they accumulate only as insoluble 
aggregates. Protein synthesis is ten times faster in prokaryotes than in eukaryotes (15 sec vs 2- 
3 min for a 40 kDa protein; Netzer and Hartl, Nature (1997) 388:343-349). This parallels cell 
division rates and allows each domain of a multi-domain protein to fold as it's made in 

.^0 eukaryotes. free from interference by simultaneously folding downsu-eam domains. This 
adaptation acconmiodates the rise of multi-domain proteins in eukaryotes, which facilitate 
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compamnentalizaiion of complex metabolic networks. In prokaryoies, protein synthesis is so 
fast that multiple domains are synthesized before they have time to fold, and may therefore 
interfere with the folding of each other. For this reason, prokaryotes have fewer multi-domain 
proteins, and many examples exist of prokaryotic single-domain proteins the eukaryotic 
homologs of which are linked into continuous polypeptides. Thus, inter-domain interference 
during folding may be a major factor contributing to the high failure rate for heterologous 
expression of eukaryotic multi-domain proteins in prokaryotes. 

If most natural proteins have highly "designable" folds, then for each such protein 
there should be many sequences capable of specifying efficient, stable folding of the ftinctional 
protein in heterologous environments. Natural proteins cannot be expected to fold any more 
efficiently than necessary in their natural milieus. In general, eukaryotic proteins may fold 
more slowly than prokaryotic proteins because the risk of aggregation is much greater in the 
prokaryotic cytoplasm. Because of the ten-fold higher prokaryotic protein synthesis rate, local 
concentrations of nascent proteins are much higher and nascent proteins have little chance to 
fold while still tethered to the ribosome. Folding is essentially a uni-molecular reaction and 
therefore independent of concentration, whereas, aggregation is effectively bi-molecular, and 
is therefore strongly favored by high concentrations of the protein in question. Since the 
insoluble fraction does not turn over, it grows monotonically, providing an ever increasing 
substrate for aggregation. As a result, the aggregation rate may rise exponentially, rapidly 
reaching a point where little nascent protein escapes. Thus, aggregation is a threshold-like 
phenomenon which 

is exquisitely sensitive to small changes in parameters such as the folding rate and synthesis 
rate, which affect the initial aggregation kinetics. 

The sampling of conformation space to find associations which nucleate cooperative 
assembly of the final fold is generally the rate-limiting step in folding, and the rate of this 
process is sharply limited by the energies of off-pathway interactions. Mutations which cause 
even modest destabiiization of off-pathway interactions may accelerate folding sufficiently to 
increase soluble protein by several orders of magnimde. Thus, single mutations have been 
observed to accelerate folding up to 1000-fold, with double and triple mutants having even 
larger effects (Jackson, Folding and Design (1998) 3:R81-R9l). For unstable heterologous 
proteins with selectable phenotypes, it should be possible to access such mutations by 
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combinatorial mutagenesis, selecting for restoration of the phenotype. For proteins which do 
not have selectable phenotypes, there is currently no reliable method to select for mutations 
which accelerate folding in a heterologous host. However, our demonstration of a tight 
correlation between the folding rate of proteins of interest and the strength of an artificially- 
5 linked phenotype, now makes it possible to identify such mutations in proteins of interest 

which lack selectable phenotypes. Often, over-expression of a misfolding protein as a fusion 
with an otherwise stable protein will improve the soluble yield of the misfolder. However, 
this works best when the stable partner is N-terminal, and has a chance to fold before the 
misfolder can aggregate or turn over. When the stable partner is C-terminal, as in our system, 

U) aggregation or proteolysis of the misfolder can commence before the stable parmer can fold. 
In bacteria, the N-ierminal stable partner may sterically hinder aggregation of the misfolding 
partner, and the local concentration of nascent misfolder is reduced by a factor approximately 
equal to I minus its proportion of the total molecular weight. Nevertheless, the strength of the 
selectable phenotype is still proportional to the solubility of the ftision protein, which in turn is 

15 limited by the aggregation rate of the slowest folding component. 

We have shown that after random mutagenesis of a protein of interest using a low 
mutational operator, such as error-prone PGR (Cadwell and Joyce, in PGR Primer A 
Laboratory Manual , Dieffenbach and Dveksler (eds.) (1995) Cold Spring Harbor Press, Cold 
Spring Harbor, NY, pp. 583-590), DNA shuffling (Crameri et al.. Nature Biotechnology 

20 (1996) 14:315-9), random-priming recombination (RPR), (Shao et aL, Nucleic Acids Research 
(1998) 26:681-3), or the staggered extension process (StEP), (Zhao et al.. Nature 
Biotechnology {\99S) 16:258-261) faster-folding variants can be selected by screening for 
higher levels of the selectable phenotype. Furthermore, as variants are selected which fold 
faster than the marker, the marker folding rate becomes limiting because even stable proteins 

25 will aggregate to some extent when over-expressed. However, as explained above, the marker 
is generally less prone to aggregation as the fusion than alone, so the maximum phenotype 
level produced by fusion of the marker with fast-folding variants may be even higher than that 
produced by the marker alone. Since the solubility of the ftision protein may become limited 
by the folding rate of the marker domain, the solubility of the optimized protein of interest 

30 may be even higher when expressed alone, without the marker fusion domain. Also, we have 
found that substantial improvements in expression can be achieved with single mutations, even 



p. 
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for proteins which already express well. We attribute this to the fact that rate-limiting 
intermediates may be readily destabilized by single mutations. This means that in mutagenic 
libraries with mutation frequencies on the order of one per molecule, the frequency of faster 
folders may be greater than -one-tenth of the inverse of the chain length, or -one in 2500 
for a 250-residue protein. Thus, large libraries are not needed to fmd high-expressing variants 
of poor expressors. The fact that folding can be optimized with few mutations also minimizes 
the likelihood of introducing immunogenic epitopes into therapeutic proteins. 

Optimization of bacterial expression of pharmaceutical/industrial proteins. 

The industrial and pharmaceutical utility of many proteins is limited by prohibitive 
production costs, due to the difficulty of producing stable, functional protein in quantity. Most 
such proteins are not abundant either in their native sources, or in heterologous hosts. 
Combinatorial optimization of the expression of most such proteins has previously been limited 
to those which confer selectable phenotypes on the production host. However, with the subject 
invention it is now possible to optimize the expression of any protein in any heterologous host 
by muiagenizing the protein and expressing the mutant library as a C-terminal fusion with a 
selectable marker. Mutations which accelerate folding are selected on the basis of the strength 
of the marker phenotype. Selected clones should be enriched for native activity such that only 
a modest number will have to be screened to recover the desired activity. The benefits of 

optimized expression are manifold. Not only are yields increased and production and 
purification costs lowered, but higher levels of purity are often possible when the desired 
product is a higher proportion of the starting material. 

Since proteins have evolved to fold only as efficiently as necessary in their native 
environments, it is reasonable to expect that most proteins could be mutated to fold more 
efficiently. The absolute limits of folding efficiency in vivo are not known, but with the subject 
invention, it may be possible to test those limits. First, selectable markers can be optimized by 
mutagenesis and selection for maximum strength of phenotype. If such folding-optimized 
markers have any remaining tendency to aggregate when over-expressed, it will be even 
further reduced when they are expressed as fusions to mutagenized proteins of interest. Thus, 
folding-optimized markers should place no limit on the optimization of proteins of interest. 
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This could allow valuable proteins to be produced in higher yields with higher activities and 
purity than previously possible. Also, fast-folding variants are likely to be more 
thermodynamically stable as well, since recent experimental (Plaxco et al., J. MoL BioL 
(1997) 270:763-70; Mines et al., Chem. BioL (1996) 3:491-7) and theoretical (Gutin et al., 

5 Proc. NatL Acad. ScL USA (1995) 92:1282-6; Wolynes et aL, Chem, BioL (1996) 3:425-32) 
work suggests that folding rates are closely correlated with stability of the native state. This is 
not unexpected if mutations which accelerate folding by destabilizing off-pathway 
intermediates also stabilize the native conformation by reducing the ensemble of kinetically- 
accessible alternatives. Thus, it will be important to compare the thermal stabilities of fast- 

10 folding variants and their wild-type precursors. 

There are two types of stability in proteins. The first relates to tolerance of extreme 
conditions, and the second relates to half-life under favorable conditions. They are not 
necessarily mutually inclusive. The reason for this is that activity is often lost reversibly 
before it is lost irreversibly, but the reverse is not possible. In fact, if loss of activity under 

15 extreme conditions were entirely reversible, it would have little to do with the half-life of the 
protein, which is primarily a function of the rate of irreversible aggregation. Each trait is 
potentially valuable for industrial proteins. Proteins which work better under extreme 
conditions, and/or last longer will fetch a premium on the market, in addition to savings 
realized from reduced production costs, and possible premiums for higher purity. Folding 

20 optimization selects primarily for reduced tendency to aggregate. Since aggregation is the 

principal route of irreversible inactivation, this should prolong the half-life. So long as this is 
accomplished by destabilizing aggregation-prone intermediates without undue effect on the 
enthalpy or entropy of the ground state, reversible stability should not be adversely affected. 

25 Efficient searching of protein libraries in vivo will require preselection for stability. 

Current efforts to accelerate the discovery and validation of new therapeutic and 
diagnostic targets and reagents to meet growing health care and pharmaceutical industry 
demands depend heavily on continuing development of new and improved recombinant DNA- 
based protein engineering methods. For example, to realize the potential of genomics for the 

30 identification of new therapeutic targets, high-throughput methods for the functional analysis 
of expressed sequences of interest will be required. One important approach to rapid 
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functional analysis will be to use protein-protein interaction traps to identify networks of 
interactions within and between the proteomes of human ceils, tissues, and pathogenic 
organisms, using cDNA expression libraries. However, current methods for the construction 
of cDN A libraries for fusion expression, as required for interaction trapping are so inefficient 
that recovery of only the most abundant and robust ligands can be expected. The vast majority 
of cDNA sequences in such libraries are not stably expressed, either because the reading frame 
is incorrect, or because the encoded fragment is not in register with a foldable domain. 

Current recombinant DNA methods allow the construction of cDNA libraries in 
bacteria containing up to 10'' independent clones, or > 10^ times the average number of 
expressed genes in mammalian tissues. If the frequencies of the rarest genes are assumed to 
be in the 10** range, and on average only one in a hundred clones of a gene make a stable 
interaction-competent product, there would still be ten such clones for each rare expressed 
sequence in a library of 10^ Thus, the initial size of the library is not theoretically limiting. 
What limits recovery from these libraries with respect to the expression host is the 
transformation efficiency, and with respect to the screen recovery is limited by throughput and 
signal-io-noise ratio. In yeast, libraries are limited by transformation efficiency to - LO^-iO' 
clones. In mammalian cells the limit is - 10^-10\ Thus, comprehensive searching of such 
libraries in eukaryotic hosts will require not only enrichment for stable expressors, but also 
normalization to bring the frequencies of rare expressors within range of the library size 
limits. 

Throughput is limited by the ability to detect positive clones in the presence of an 
excess of negative clones. Clone-by-clone screens have the lowest throughput, color screens 
have intermediate throughputs, and viability screens generally have the highest throughputs. 
However, even when neither transformation nor screening is limiting, as with biopanning or 
viability selection in a bacterial expression host, recovery is still limited by the "needle-in-a- 
haystack" problem, whereby the discriminating power of the screen, or "signal-to-noise" ratio 
determines the minimum product of frequency and affinity which can be selectively enriched 
above background. Thus, if on average, only one randomly-primed cDNA fragment in a 
hundred expresses a stable, functional domain in the heterologous host (a conservative 
estimate), then the frequencies of all such clones could rise by up to two orders of magnitude 
if a way could be found to eliminate the non-expressors from the libraries before screening. 
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Such an enrichment could make the critical difference for recovery of many important 
interactors. 

Relevent Literature 

Albano et al. . Biotechnol Prog (1998) 14:35 1-4 describes the use of the correlation 
between the fluorescent intensity of the reponer GFP and the functional activity of a protein to 
which it is fused. Similariy. Waldo et al.. Nature Biotechnology, (199) 17:691-695, describes 
the use of GFP to predict solubility of a protein of interest by expressing it as a chimeric with 
the protein of interest. See also PCT/US98/25862. 



SUMMARY 



Methods are provided for obtaining host cells expressing a mutant of a desired protein 
optimized for expression in the host cells, for obtaining a protein with enhanced stability as 
compared to a wild type of the desired protein, and for identifying peptides that can 
stabilize an unstable protein, in each case by expressing die protein linked to a selector 
protein that confers a selectable phenotype on the host cell. In the method for identifying 
stabilizing peptides, the unstable protein is coexpressed with members of a random peptide 
library. To obtain oprimized expression of a protein in a host cell, the method includes the 
steps of preparing a library of mutagenized coding sequences for the protein of interest, 
purifying the members of d>e library of muugenized coding sequences, ligating each 
member of the library into an expression cassette in frame with the coding sequence for a 
selector protein, transforming a multiplicity of host cells with the expression cassettes, 
growing the resulting transformed host cells under conditions for which the selector protein 
confers ability for the transformed cells to grow to produce the mutant proteins joined to 
the selector protein, identifying cells that express mutant proteins at a selective pressure 
higher than that of cells expressing an unmuugenized protein. Proteins with enhanced 
subility can be obtained by cleavage from the selector protein, or expressing the mutant 
protein as a fi-ee protein in the host cells for which it is optimized. The invention finds use 
for example in optimizing mammalian peptides for improved expression in prokaryotic 
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cells and for identifying peptides that can be used for treating diseases that are 
characterized by production of an unstable variant of a wild type protein. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 . An illustration of the alternative fates for nascent proteins when expressed in 
cells at normal levels. When overexpressed, aggregation is an additional fate (not shown). 
DnaK and DnaJ are bacterial Hsp70 and Hsp40 proteins, respectively. GroEL is the bacterial 
Hsp60 complex, and GroES is the companion HsplO complex. For 2-domain proteins like 
GFP-CAT, we hypothesize that misfolding of a single domain leads to turnover of the entire 
protein. 

Figure 2. Expression construct for GFP-CAT fusions. T7prom, phage T7 promoter; 
(G4S)3, flexible spacer between the GFP and CAT domains; His6, hexa-histidine tail for 
affinity purification; T7t» phage T7 transcription terminator; ori, origin of replication; bla, 
ampiciilin resistance. Arrow denotes start of translation. 

Figure 3. Chloramphenicol resistance of £. coli NovaBlue DE3 cells expressing CAT, 
wtGFP-CAT, and GFPuv-CAT. Cells expressing each construct were plated at 1000 cells per 
plate onto solid LB medium containing 0.02 mM IPTG and icreasing concentrations of 
chloramphenicol. After overnight growth at 37° C colonies per plate were scored and plotted 
against cam concentration. 

Figure 4. Correlation of chloramphenicol resistance with fluorescence intensity in cells 
expressing mutagenized GFP as the GFP-CAT fusion. Mutant library transformants were 
seeded at 1000 per plate on increasing concentrations of cam. The percentage of colonies 
fluorescing brighter than wtGFP-CAT was determined visually and plotted against cam 
concentration. 

Figure 5. Fluorescence emission spectra for cells expressing four GFP-CAT 
constructs. GFP-CAT expression was induced during log phase growth in suspension, after 
which the cells were washed and adjusted to a density of O.D.eoo = 0. 1. Emission spectra 
were then taken at an excitation wavelength of 390 nm. 

Figure 6. Selection of protein-stabilizing peptides from random peptide libraries (RPL) 
using the Fold Selector system. Figure A. Genes for unstable extra-cellular proteins of choice 
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(p.o.c.)* such as amyloid p protein (Ap), fused to the N-terminus of p-lactamase via a flexible 
linker, (G4S)3, may be transcribed from the trp4ac fusion promoter {trc prom) in a pi5A 
replicon (pl5A ori) with kanamycin resistance {kan) for plasmid retention. N-terminal signal 
peptides (SP) on this and the RPL fusion protein allow export of the gene products to the E. 
coll periplasm. The RPL genes, encoding random peptides fused to the N-terminus of 
thioredoxin via a GaS linker, may be transcribed from the lac promoter in a pUC phagemid 
with chloramphenicol resistance {cat) for plasmid retention. The phagemid origin of 
replication (fl ori) allows the RPL construct to be packaged in phage and quantitatively 
introduced into cells expressing the unstable protein by infection at high multiplicity (m.o.i.). 
Peptides are selected by their ability to stabilize the p.o.c. and thereby confer growth on non- 
permissive antibiotic concentrations. Figure B. Expression of unstable intra-ceiluJar proteins 
is similar except that chloramphenicol acetyl transferase (CAT) is used as the C-terminal 
fusion to allow selection of stabilizing peptides for chloramphenicol resistance. Also, SPs are 
eliminated to allow retention of expressed proteins in the cytoplasm, and ampicillin resistance 
{amp) is used for pl5A plasmid retention. 

BRIEF DESCRIPTION OF THE SPECIFIC EMBODIMENTS 

Methods for obtaining a protein of interest that is optimized for expression in a host 
ceil, particularly a prokaryotic ceil to which the protein of interest is heterologous are 
provided. To obtain a protein that is optimized for expression in a particular host cell, such as 

coli, members of a library of mutagenized coding sequences for the protein of interest 
joined to the coding sequence for a selector protein are transformed into the host cells which 
are then grown under conditions for which the selector protein confers ability for the 
transformed cells to grow. Generally the coding sequence for the chimeric protein contains a 
coding sequence for a linker peptide between the mutangenized library member and the coding 
sequence for the selector protein; prefereably the linker is a flexible linker. The selector 
protein can be any protein that provides for a selectable phenotype, such as antibiotic 
resistance. The protein of interest need not have a selectable phenotype. Cells that express 
the mutant proteins joined to the selector protein at a selective pressure that is nonpermissive 
for host cells expressing an unmutagenized protein are those that contain a protein optimized 
for expression in the host cell. The mutant proteins can be further screened to identify those 
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that have retained one or more wild type function, and also can be screened to identify those 
that have one or more altered characteristic » such as increased solubility, increased half life 
and decreased temperature sensitivity. 

Also provided are methods for screening for peptides, particularly peptides of from 

5 about 3 to 20» generally of about 12 or less amino acids, that can stabilize unstable proteins 
such as those associated with particular disease states. A chimeric protein that includes the 
defective protein and a selector protein is coexpressed with a member of a tethered random 
peptide library in a host cell grown under selective conditions. The members of the library are 
each a linear chain of about 3 to 20 amino acids, preferably a linear chain of 12 or less amino 

10 acids, fused via a flexible linker to a stable carrier. Growth of cells under selective conditions 
is indicative of cells that contain a peptide which stabilizes the defective protein, which can 
then be identified and screened to assess whether it can also stabilize a free defective protein. 

Description of the Invention 

15 What is demonstrated is the feasibility of using a surrogate marker domain is 

demonstrated in a two-domain fusion with a protein of interest in £. coli to select mutagenic 
variants of die protein of interest with improved expression as a function of accelerated folding 
kinetics. We accomplished this goal by demonstrating a high degree of correlation between 
the functional stabilities of a selectable phenotype, chloramphenicol resistance, and a test 

20 protein, GFP, in E. coli. Chloramphenicol resistance is conferred by chloramphenicol acetyl 
transferase (CAT), which has been stably and functionally expressed as both N- and C- 
terminal fusions with many heterologous proteins (e.g., Dekeyzer ei ai,. Protein Engineering 
(1994) 7:125-130; Zelazny and Bibi, Biochemistry (1996) 35:10872-10878). Functional GFP 
and folding variants thereof (Stemmer, Proc, NatL Acad. Sci, USA (1994b) 91:10747-10751) 

25 emit a bright green fluorescence in blue or uv light. The fluorescent chromophore of GFP is 
formed autocatalytically in the folded protein by cyclization of the peptide backbone of Ser65, 
Tyr66, and GIy67 (Cubitt et aL, Trends inBiochem, Sci, (1995) 20:448-455). GFP has also 
been stably and functionally expressed as both N- and C-terminal fusions with many 
heterologous proteins (Cubiti et aL, Trends inBiochem, Sci, (1995) 20:448-455). 

•^f> Previously, we had attempted to isolate spectral variants of GFP from mutagenic 

libraries for use in a fluorescence energy transfer-based protein-protein interaction trap 
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(Delagrave et al,. Bio/Technology (1995) 13:151-154; MiiraetaL. Gene(l996) 173:13-17). 
In the course of that work, many non-fluorescent GFP variants were observed during mutant 
library screening. Western blot analyses revealed that for most of these variants, soluble, 
recombinant GFP protein failed to accumulate in cells under conditions in which soluble wild 
type GFP was readily detectable (Mitra and Balint, unpublished). However, in most cases 
ftiU-length GFP protein could be recovered from the insoluble fraction of both mutant and wt 
GFP-expressing cells in comparable amounts. These observations suggested that fluorescence- 
null GFP variants arise from destabilizing siructuraJ or folding mutations more frequently than 
from active site mutations, i.e., mutations which inhibit chromophore formation but not 
folding. This is consistent with the fact that in most proteins, active sites are smaller 
mutational targets than structure-determining sites, and this can be estimated for GFP from the 
known x-ray crystal structures (Ormo et aL, Science (1996) 273: 1392-1395; Yang et aL, 

Nature Biotechnology (1996) 14:1246-1251). The availability of x-ray structures and known 
folding mutations in a protein which is otherwise stably and functionally expressed in E. coli, 
make GFP an usefiil tool for testing the concept that the correlation of the stability of one 
domain with the activity of another domain in a multi-domain protein can be used to isolate 
stable variants of proteins which do not have selectable phenoiypes. While the invention is 
exemplified with GFP as the protein to be stabilized, and neomycin as the selectable marker, 
any protein which is desired to stabilize can be substituted for GFP and any selectable marker 
can be substituted for neomycin. Likewise, while the invention has been exemplified in E. 
coli, any other host cell of interest either prokaryotic or eukaryotic, can be substituted for E. 
coli. 

The following examples are offered by way of illustration of the present invention, not 
limitation. 
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EXAMPLES 
Example 1 

Correlation of functional GFP with chloramphenicol resistance for wild-type GFP 
and a fast-folding variant expressed as C-tenninal fusions with CAT in E. colL 
The coding sequences for wild-type (wt) GFP and a highly-expressing variant of GFP 

(GFPuv), (Crameri et aL, Nature Biotechnology (1996) 14:315-9) were inserted into the 
pET23a vector (Novagen, Inc.) between Nhel and BamHI (see Figure 2). pET23a is an 
ampicillin-resisiant pBR322 derivative in which transcripdon of inserted coding sequences is 
controJied by the bacteriophage T7 promoter and transcription terminator (Moffatt and Studier, 
J, MoL BioL (1986) 189: 1 13-130). Expression is restricted to hosts, such as NovaBlue (DE3) 
(Novagen. Inc.), which have been transformed to express the T7 RNA polymerase. GFP 
fluorescence can be readily observed in colonies of these cells harboring the pET-GFP 
construct by illuminating with long-wave uv light. The spectrum, quanuim yield, and 
extinction coefficient of GFPuv do not differ appreciably from wtGFP, consistent with a 
difference of only three of 238 amino acids (Crameri et ai,. Nature Biotechnology (1996) 
14:315-9). However, when expressed from identical constructs in E. coli cells GFPuv 
produces 30-45 times more steady state fluorescence than does wtGFP. Since the specific 
fluorescence of both proteins is comparable, it may be concluded that the higher fluorescence 
intensity of cells expressing uvGFP is due to a comparable increment in the steady state 
amount of soluble, funcuonal uvGFP protein. This is supponed by data indicating that the 
total amount of GFP protein in the cells is comparable for both, but that proportionally more 
GFPuv is present in the soluble pool. Thus, the mutations in GFPuv appear to have increased 
its steady-state activity in E. coli by specifically reducing its tendency to aggregate, 
presumably as a result of an increased folding rate. 

To use a surrogate marker to select for more stable variants of a protein of interest, die 
selectable marker coding sequence must be inserted downstream from diat of the protein of 
interest to insure that selection is not favored by premature termination of the protein of 
interest. Thus, the CAT coding sequence was inserted into the Xhol site of pET23a in the 
same reading frame as die upstream GFPs. Between the two a 15-residue flexible, hydrophilic 
linker, (Gly4Ser)3, was encoded with convenient restricuon sites for facile replacement of both 
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GFP and CAT sequences. The CAT sequence terminates in a His6 tail for facile purification. 
This construct, pET23a-GFP-CAT is shown in Figure 2. 

Table I and Figure 3 show the results of comparisons of the chloramphenicol resistance 
and fluorescence characteristics of wtGFP and GFPuv expressed alone and as C-terminai 
5 fusions with CAT from pET23a in E. coli strain BL21(DE3). When expressed alone, GFPuv 
produces - 30 times more steady state fluoresence than wtGFP, as determined by fluorometry 
of suspensions of equal numbers of cells fi-om overnight growth on solid medium. Maximum 
transcription normally requires induction of T7 polymerase expression with IPTG, but a low 
level of transcription occurs even in the absence of IPTG. Interestingly, induction of wtGFP 

H) by IPTG produces no detectable increment in fluorescence over the uninduced level, though 
the level of wtGFP protein is considerably higher, as determined by gel electrophoresis. This 
suggests that at the uninduced expression level, aggregation is minimal, probably due to sub- 
threshold concentrations of nascent GFP. This is supported by the fact that under uninduced 
conditions, GFPuv fluorescence is coniparable to that produced by wtGFP, which would be 

15 expected if wtGFP fluorescence is not limited by aggregation. At the induced expression 
level, however, substantial insoluble material forms in the wtGFP-expressing cells, 
presumably due to much higher nascent wiGFP concenu-ations. Under the same conditions 
GFPuv produces -30-foId higher fluorescence intensity. From this we conclude that GFPuv 
is much less prone to aggregation at similar expression levels, i.e., nascent protein 

20 concentrations, presumably due to a higher folding rate. 



.30 



wo 01729225 



PCT/US00/0S477 



17 

Table L Comparison of Relative Steady-State Fluorescence Intensity and 
Chloramphenicol Resistance (CamO for CAT Fusions of wtGFP and GFPuv in E. colL 



Expression Product 


IPTG 


Cam^= 


Fluorescence ^ 


WtGFP 


0 


NA 






0.02 mM 


NA 


Ix 


GFPuv (3 mutations) 


0 


NA 


Ix 




0.02 mM 


NA 


30x 


CAT 


0 


34 jxg/ml 


NA 


0.02 mM 


306 \xg/m\ 


NA 


wtGFP-CAT 


0 


34 |iig/ml 


Ix 




0.02 mM 


238 ^g/ml 


2x 


GFPuv-CAT 


0 


34 jig/ml 


Ix 




0.02 mM 


340 \xg/m\ 


4x 



a. Chloramphenicol resistance (Cam*^) was determined as the highest concentration 
in solid LB medium on which at least 50% of cells plated formed visible 
colonies after overnight growth. 



b. Fluorescence was determined by fluorometry at Xexcite = 395 nm and ;iemit ~ 
508 nm of cell suspensions of 0. 1 00600 after overnight growth on solid LB 
medium. 

Interestingly, when induced, cells expressing CAT alone were resistant to less 
chloramphenicol than cells expressing GFPuv-CAT. This suggests that GFPuv, derived from 
a jellyfish protein adapted to life at - 13°C, is less prone to aggregation when over-expressed 
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in bacteria at 37°C than is CAT, a native bacterial protein. Consistent with this, cells 
expressing induced GFPuv-CAT are much less fluorescent than cells expressing GFPuv alone, 
suggesting that the stability of GFPuv-CAT is limited by the stability of CAT. The 
improvement in CAT expression in GFPuv-CAT is probiably due to its reduced concentration 
5 in nascent protein (CAT is less than half the size of GFPuv-CAT and their synthesis rates 

should be similar), and/or GFPuv may sterically interfere with CAT aggregation. On the other 
hand, wtGFP appears predictably to be less stable than CAT, since induced wtGFP-CAT 
confers less chloramphenicol resistance than CAT alone, but more fluorescence than wtGFP 
alone. In general, SDS PAGE analyses of soluble and total extracts were consistent with the 
10 fluoresence and chloramphenicol resistance phenotypes, i.e., higher soluble protein levels 

correlated with higher fluorescence and higher cam resistance, though total protein remained 
more or less constant. 

Example 2 

15 

Isolation of super-stable variants of OFF from mutagenic library expressed as 

fusion with CAT and selected for increased chloramphenicol resistance. 

The GFP coding sequence may be subjected to random mutagenesis by any of several 

methods, including error-prone PCR (Cadwell and Joyce, in PCR Primer A Laboratory 

20 Manual , Dieffenbach and Dveksler (eds.) (1995) Cold Spring Harbor Press, Cold Spring 

Harbor, NY, pp. 583-590), DNA shuffling (1994a,b), random-priming recombination (RPR), 

(Shao et al,. Nucleic Adds Research (1998) 26:681-3), or the staggered extension process 

(StEP), (Zhao et aL, Nature Biotechnology (1998) 16:258-261). The wtGFP coding sequence 

may be amplified from pET-wtGFP using primers containing the Ndel site at the translation 

25 

start, and at the unique EcoRI site just beyond the GFP C-terminus in pET-GFP-CAT. The 
error-prone amplification reaction is carried out in the presence of excess Mg^'*^ and excess 
deoxynucleoside triphosphates to encourage mis-incorporation. An excess of primers may also 
be used, since the final mutation frequency is proportional to the number of cycles, so long as 
.^0 the primers are not exhausted. Under standard error-prone conditions (Cadwell and Joyce, in 
PCR Primer A Laboratory Manual , Dieffenbach and Dveksler (eds.) (1995) Cold Spring 
Harbor Press, Cold Spring Harbor, NY, pp. 583-590), a mutation frequency of -0.7% is 
produced in the final product after 25-30 cycles. Since 75% of coding mutations produce 
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coding changes, we would expect ^ 3-4 amino acid substitutions per GFP clone from this 
protocol. The GFP coding sequence in plasmid pGFP (Clontech Laboratories, Inc.) was used 
as template for error-prone amplification using the end-specific primers 
AGCAGTCGCTTCACGTTCGCTCGC and GCATTCATCAGGCGGGCAAGAATG. The 

5 template GFP sequence was that reported by Chalfie et al, (1994) with the following changes: 
the staning triplet MGK has been replaced by MASK derived from the pET23a multiple 
cloning site, and the final SG has been replaced by NS. The same changes were also 
introduced into the GFPuv sequence (Clontech Laboratories, Inc.). Additionally, a Q80R 
mutation derived from a PGR error has been retained. The error-prone PGR product was gel- 

10 purified, and ligated back into the GFP-CAT fusion expression construct shown in Figure 2. 
The ligation product was then introduced into cells of E. coii strain NovaBlue (DE3) by high- 
voltage electroporation. Transformants were then plated onto solid Luria-Bertani medium 
containing increasing amounts of chloramphenicol, ranging from 34 |xg/ml to 544 |ig/ml, and 
incubated at 37*'C overnight. 

15 An initial assessment of the correlation of cam resistance with fluorescence intensity 

was made by a visual estimation of the percentage of colonies which fluoresced more intensely 
than wtGFP-CAT as a function of cam concentration. The results are illustrated in Figure 4. 
As the concentration of cam increased, the frequency of brighter colonies also increased. At 
cam concentrations above 270 ^g/ml, the probability of visually identifying a brighter colony 

20 rose above 10%. When these brighter clones were restreaked, they invariably remained 

brighter than wtGFP-CAT-expressing cells. Interestingly, when colonies resistant to high cam 
were replated onto 34 jig/ml cam, the percentage of brighter colonies rose to -30%. Thus, 
many clones expressing brighter GFP variants were masked by the high concentrations of cam 
used for selection which probably inhibited protein synthesis somewhat. 

25 From the first round of mutagenesis colonies were recovered from cam concentrations 

of up to 408 p.g/ml, exceeding the cam resistance of the GFPuv-CAT construct. For the 
purpose of phenotypic screening, ten clones were picked from plates containing between 306 
and 408 ng/ml cam. These clones were selected solely for their ability to grow on cam 
concentrations which were non-permissive for the parental wtGFP-CAT construct. In ambient 

.^0 room light GFP fluorescence was not visible. For a second round of mutagenesis forty clones 
from the first round were pooled in ambient light from cam concentrations which were non- 
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permissive for wtGFP-CAT. Plasmid DNA was purified from the pooled clones and used as 
template for mutagenesis by the staggered extension process (StEP) for in vitro recombination 
of mutations selected from the first round (Zhao er aL, Nature Biotechnology (1998) 16:258- 
261). This method employs a template switch recombination mechanism, in which a short 
extension rime is used to allow only partial replication of the sequence during each cycle. 
Thus, each full-length copy is generated over several cycles with the template being switched 
between each cycle. The second round product was ligated back into the vector as before and 
plated onto the same range of cam concentrations as used for the first round. Colonies were 
observed on cam concentrations of up to 510 ng/ml. Again, for phenotypic screening, 10 
clones were picked in ambient room light from plates containing between 306 and 510 ^g/ml 
cam. 

The 20 total clones selected from cam plates in rounds one and two were grown in 
suspension in 5 ml of LB containing either 100 jig/ml ampicillin or 34 ^g/ml chloramphenicol. 
At an O.D600 of 0.4, protein expression was induced with 0.4 mM IPTG. After 16 hours of 
overnight growth, cells were washed and resuspended in phosphate buffered saline (PBS) at 
OD600 of 0. 1 , and the whole cell fluorescence was measured by excitation at 395 nm and 
emission at 508 nm. Fluorescence emission spectra were then determined for the brightest 
mutants from each round, designated GFPRl-CAT and GFPR2-CAT respectively. Each had 
an emission maximum at 5 10 nm when excited at 390 nm. The emission spectra of these 
clones are compared to those of wtGFP-CAT and GFPuv-CAT in Figure 5, To assess the 
effect of the CAT gene on GFP expression, stop codons were inserted at the C-termini of the 
coding sequences of GFPwt, GFPuv, GFPRl, and GFPR2 in the GFP-CAT fusion constructs 
to allow expression of the free GFPs. The relative fluorescence intensiries for GFP expression 
with and without CAT are summarized in Table II. 
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Table II. Relative Fluorescence Intensities of GFP Constructs 



CONSTRUCT 


Relative Emission 




Intensitv at 510 nm 


GFPwt-CAT 


1 


GFPuv-CAT 


4 


GFPRl-CAT 


14 


GFPR2-CAT 


56 


GFPwt 


i 


GFPuv 


30 


GFPRl 


40 


GFPR2 


40 



15 

The brightest mutant from round one, GFPRI-CAT, was 14 times brighter than 
wtGFP-CAT and 3:5 times brighter than GFPuv-CAT. GFPRl-CAT was also resistant to at 
least 408 \ig/m\ cam, whereas GFPuv-CAT could resist only 340 ^ig/ml. The brightest mutant 
from round two, GFPR2-CAT, was 56 times brighter than GFPwt and 14 times brighter than 

20 GFPuv-CAT, and could grow in 510 \ig/m\ cam. Interestingly, die dramanc increases in 
expression seen with GFPRl-CAT and with GFPR2-CAT over GFPuv-CAT essentially 
vanished when CAT was removed. All three expressed at comparable levels, 30-40 times that 
of wtGFP. Whereas GFPuv expression was inhibited by fusion to CAT by 7-8-fold, 
expression of GFPRl was inhibited only - 3-fold, and expression of GFPR2 was actually 

25 enhanced slightly by fusion to CAT. Thus, not only were mutations selected which enhanced 
expression of the free GFP protein, but mutations were also selected which sp)ecifically 
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enhanced expression of GFP as fusions with other proteins. The reduced expression of 
GFPuv-CAT relative to free GFPuv is probably not due to dominant weaker expression of 
CAT because CAT does not have the same effect on GFPR2. Radier the reduced expression 
of GFPuv-CAT is probably due to mutual steric interference with the folding of the two 
5 proteins, and the same is probably true to a lesser extent for GFPRl-CAT. SDS-PAGE 

confirmed that the GFPRl-CAT, GFPR2-CAT, GFPwt-CAT, and GFPuv-CAT all comprised 
over 50% of the total cell protein. The increase in brighmess for the GFPRl-CAT and 
GFPR2-CAT mutants over that of wtGFP-CAT and GFPuv-CAT is reflected by the difference 
in the amount of protein in the soluble fraction. For example, about 25% of GFPR2-CAT 

10 protein is soluble, whereas only about 1-2% of wtGFP-CAT protein was soluble. 

DNA sequences were determined for the entire open reading frames of GFPRl and 
GFPR2, and compared to those of wtGFP and GFPuv (see Table III). In addition, the 
reported mutations in GFPuv were confirmed. Surprisingly, one mutation was shared by all 
three improved GFPs, V164A. Even more surprisingly, this was the only mutation present in 

15 GFPRl. Since GFPRl expresses as well or better than GFPuv as the free protein, this 
suggests the other mutations in GFPuv are not necessary. GFPuv had originally been 
"evolved** by repeated rounds of recombinatorial mutagenesis by DNA shuffling and 
phenoiypic selection, followed by back-crossing to eliminate deleterious mutations (Crameri et 
ai,. Nature Biotechnology (1996) 14:315-9). However, it appears that only one of the three 

20 remaining mutations is actually required for the complete phenotype. Thus, not only was 

recombination unnecessary, but the required mutation could have been recovered easily from a 
few thousand clones of a standard Tag polymerase amplification of the wtGFP coding 
sequence. Under standard conditions —25 cycles of PCR with a non-proofreading polymerase 
such as Tag produces one mutation per - 700 bp, which is roughly the size of the GFP coding 

25 sequence. Since only a single nucleotide change with a frequency of 1/3 is required for the 
V164A mutation, the expected frequency would be one in only -2100 clones. 
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Table III. Sequence Comparison of Three High-Expressing GFP Variants and wtGFP 



Amino Acid Residue 


GFPwt 


GFPuv 


GFPRl 


GFPR2 


100 


TTT{F) 


TCT(S) 


TTT(F) 


TTT(F) 


105 


AAC (N) 


AAC (N) 


AAC(N) 


AGC (S) 


154 


ATG (M) 


ACG (T) 


ATG (M) 


ATG (M) 


164 


GTT(V) 


GCT (A) 


GCT (A) 


GCT (A) 



Since the V164A mutation could account for all of the increase in free GFP expression 
for ail three improved GFPs, we wished to see if any other independently adaptive mutauons 
could be recovered. Before embarking on the arduous task of sequencing a large number of 
additional clones, however, we first examined the eighteen other independent clones selected 
from rounds one and two, which grew on non-permissive cam, for the presence of the V164A 
mutation. This mutation was present in ail eighteen clones. Thus, VI 64 A appeared to be the 
only single-hit mutation capable of destabilizing the aggregation-prone intermediate in GFP 
folding. Indeed, such a mutation would be expected to reduce the hydrophobicity at that 
position, and it is hydrophobicity which would be expected to drive aggregation. Any odier 
independendy adaptive mutation of comparable frequency should have appeared at least once. 
Of course, it is possible, even likely, diat combinations of two or more mutations could have 
had a comparable effect, but their frequency would have been too low to be readily selected 
from our library. 

Neither of die odier two mutations present in GFPuv appeared in any of the twenty 
selected clones, consistent with dieir apparent dispensibilliy for increased cam resistance. 
More revealing, however, is die fact diat GFPRl showed 3.5-fold higher expression as die 
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CAT fusion than GFPuv-CAT. This suggests that at least one of the two other mutations in 
GFPuv is responsible for mutual interference with CAT folding. The fact that GFPRl itself is 
still somewhat inhibited as the CAT fusion relative to its expression as the free protein* 
suggests that the wtGFP sequence is still somewhat inhibited by the folding or presence of 
CAT. The folding of GFPR2, however, was not inhibited at all by the presence of CAT. 
GFPR2 contains only one mutation in addition to V164A, namely N105S. Thus, this mutation 
is apparently responsible for the complete elimination of folding interference between GFP and 
CAT. It is not likely that the combination of mutations in GFPR2 arose by recombination 
because V164A is apparently indispensable. Rather, the combination probably arose by simple 
addition of the N105S mutation to V164A. We have confirmed that the N105S mutation by 
itself is not sufficient to confer a selectable increment in cam resistance on GFP-CAT. 

The ability of the surrogate marker fold selection system to select mutations which 
specifically enhance fusion protein expression in addition to those that enhance independent 
folding is an added benefit for proteins like GFP, which have important applications as fusion 
proteins. One question is whether fold selection in the context of fusion proteins could, on 
occasion, select only mutations which accelerated folding only in the context of the fusion, and 
did not accelerate folding of the free protein. Such mutations would be highly unlikely 
because in addition to protecting the protein of interest from interference by the fusion partner, 
such mutations would also have to in effect convert the fusion partner into a chaperone, for 
which there is no precedent. Mutations which specifically improve fusion expression, like 
GFP-N105S, can only be selected in proteins which already fold independently, like GFPRl. 
In preliminary tests with other fusion panners, GFPR2 has continued to fold independently of 
the fusion partner, neither inhibiting nor being inhibited by it. For example, as a C-terminal 
fusion with neomycin phosphotransferase (GFPR2-NPT) both fluorescence and kanamycin 
resistance were at least as high as those of the free GFPR2 and the free NPT, respectively, 
whereas both functions were inhibited in the GFPuv-NPT fusion. Since GFPuv was reported 
to express at 30-fold higher levels than wtGFP in mammalian cells, it may exhibit the same 
sensitivity to fusion expression in these cells as it does in bacteria. GFPR2 is the subject of US 
Patent Application 60/160,461. 
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Example 3 

Unstable proteins can be stabilized by peptides selected from random peptide 
libraries. 

Many diseases are caused by unstable proteins, which fail to accumulate in biologically 
active form in cells or tissues due to one or more mutations which cause a delay in folding or 
which destabilize the active conformation such that the protein is prone to insoluble 
aggregation and/or proteolysis. There are two main types of unstable proteinopathies: those 
which cause disease by forming toxic insoluble aggregates, and those which cause disease by 
loss of function. The former are represented by amyloidogenic polypeptides such as the 
amyloid p protein (Ap), which forms insoluble amyloid fibrils in the brain (Li ei aL, J, 
Leukocyte BioL (1999) 66:567-74; Cappai and While, Int. J. Biochem. Cell BioL (1999) 
31:885-9). Amyloid deposits can induce chronic inflammation and tissue damage, which are 
major etiologic components of Alzheimer's disease. There is currently much interest in the 
development of chemo-therapeutic strategies to counter the progress of amyloidogenesis and 
resultant tissue degeneration in Alzheimer's and odier amyloidogenic proteinopathies. Drugs 
which could interfere directly with the aggregation of the Ap protein would be highly 
desirable. However, no reliable method currently exists for screening chemical libraries for 
such activities. 

We have demonstrated that some unstable proteins can be stabilized by interaction with 
small peptides obtained from a random sequence library using the "Fold Selector" technology 
described above. It is expected that such peptides may be used therapeutically, or may be used 
as leads for the development of therapeutic agents, which can be used to inhibit, or perhaps 
even reverse the formation of amyloid or other types of toxic insoluble protein deposits. It is 
further expected that similar peptides could be selected for dieir ability to stabilize intra- 
cellular proteins resfwnsible for loss-of-function proteinopathies. Most mutations which lead to 
loss of physiological function do not disable the active site of a protein per se, but rather they 
destabilize the active conformation, or they interfere with the folding pathway so that the 
protein gets trapped in meta-stable intermediates which are prone to aggregation or 
proteolysis. The reason for this is that the target size for structural mutations in a natural (i.e., 
highly evolved) protein is typically much greater than die active site(s). Thus a high 
proportion of inborn errors of metabolism and other genetic disorders are caused by proteins 
which do not remain properly folded and/or do not fold properly to begin widi. Peptides which 
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Stabilize such proteins in vitro could be used to develop eel Impenetrating peptido-mimeiics 
which in turn could be used to restore the missing fiinctions by stabilizing the proteins in vivo. 

Because amyloidogenic proteins such as the Ap peptide do not produce screenable or 
selectable phenotypes, there is no conventional method to select for stabilization of these 
proteins. However, as we have demonstrated, when unstable proteins are expressed as C- 
terminal fusions to a protein with a selectable phenotype, the selectable phenotype is 
destabilized and can be used to select for stabilization of the amyloidogenic protein. Extra- 
cellular amyloidogenic proteins may be expressed in the £. colt periplasm as C-terminal 
fusions to TEM-l p-Iactamase with an intervening flexible linker such as (Gly4Ser)3. TEM-1 
p-lactamase is an E, coli piasmid-bom enzyme which confers resistance to the penicillin class 
of p-Iactam antibiotics (Genbank Accession no. J01749; Sutcliffe, Proc. NatL Acad, Sci. USA 
(1978) 75:3737-3741), The coding sequence for the 42-amino acid Ap peptide (Glenner and 
Wong, Biochem. Biophys, Res, Commun. (1984) 3:885-90; Kang et al,. Nature {1987) 
325:733-736) was sub-cloned for expression in the E, coli periplasm as a fusion to the N- 
terminus of p-Iactamase as illustrated in Figure 6. When expressed in E. coli strain DH5a, 
the tendency of Ap to form insoluble aggregates reduced p-lactamase activity such that the 
cells would not grow on ampicillin concentrations above 200 ^g/ml, whereas, when Ap was 
removed or replaced with a stable domain of comparable size (c-fos), the host cells grew with 
quantitative efficiency (i.e., >0.5 colonies per cell) on ampicillin concentrations up to 800 
^g/ml. Polyacrylamide gel electrophoresis (PAGE) confirmed that proportionally more p- 
lactamase partitioned into the insoluble fraction as the Ap fusion than as either the free enzyme 
or as the c-fos fusion. Thus, under these expression conditions ampicillin resistance could be 
used to select for stabilization of the human Ap protein. 

A random peptide-encoding library (RPL) was constructed using synthetic 
oligonucleotides to encode a chain of 12 randomly selected amino acids at the N-terminus of 
E. coli thioredoxin {tncA; Genbank accession no. M54881) with an intervening flexible linker 
(Gly4Ser). The expression constructs for this library and AP-P-lactamase are illustrated in 
Figure 6. The expression cassette for the RPL-trx fusion library was assembled in a pUC- 
based phagemid (Sambrook et aL, in Molecular Cloninjg A Laboratory Manual , 2~* ed., (1989) 
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 4.17-4.19) to allow 
rescue as phage, which could then be used to uansfect the construct quantitatively into cells 
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harboring the AP-P-lactamase expression construct. At least 10* clones of the RPL were 
rescued as filamentous bacteriophage by infection with helper phage M13K07 (Sambrook et 
^^'^ in Molecular Cloning A Laboratory Manual 2"^ ed., (1989) Cold Spring Harbor 
Uboratory Press, Cold Spring Harbor, NY. pp. 4.19-4.50). At least 10' DH5a cells bearing 
the Ap-p-lactamase construct were infected with a 100-fold excess of RPL phage to insure 
quantitative infection. At least 10" independent transfectants were then plated onto solid 
medium containing 400. 600, and 800 ^ig/ml ampicillin. 1 10 colonies were recovered after 
overnight growth on 400 ^ig/ml, 19 were recovered from 600 |ig/ml, and 4 were recovered 
from 800 jig/ml. For negative controls, ten clones were selected at random from die 
unselecied RPL/Ap-P-lactamase co-transformants, and 10" cells of each were plated onto 400, 
600, and 800 ng/ml ampicillin. After overnight growth no colonies appeared for any clone on 
any ampicillin concentration. By contrast most of the selected clones replated onto 400 fig/ml 
ampicillin, 13 replated onto 600 |ig/mi, and 2 replated onto 800 ^lg/ml. Thus, 12-mer 
peptides were selected which substantially reduced the tendency of Ap to form insoluble 
aggregates, as judged by the increased p-lactamase activity. These peptides can be produced 
synthetically and modified by known methods to increase their stability in vivo. It is expected 
that this can be accomplished for at least some of the selected peptides without sacrificing their 
ability to stabilize AP against amyloid formation both in vitro and in vivo. 

Interestingly, when a similar 12-mer RPL was constrained between the disulfide- 
forming cysteines in the active site of thioredoxin, and co-expressed with AP-p-lactamase 
ftision, fewer clones were recovered at 400 jig/ml ampicillin, and none were recovered at 
higher concentrations. It is instructive to consider why such a library might be so much 
poorer a source of stabilizing peptides than the tethered N-terminal library we used. The 
thioredoxin active site-constrained RPL and others like it have been widely used to obtain 
artificial ligands for antibodies, receptors and other proteins of interest (Colas et aL. Nature 
(1996) 380:548-550). The peptides in this RPL are constrained into a closed loop on the 
surface of the protein by the disulfide, and are therefore expected to be somewhat more rigid 
than the N-terminal peptides in our library. Rigidity is important for high-affinity protein- 
protein interactions because it minimizes the entropy cost of binding. However, flexibility 
may be more important for protein stabilization. Conventional protein-protein interactions are 
surface interactions, whereas stabilizing peptides may need to interact with residues which 
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become iternalized in the active conformation. Also, the diversity of constructive interactions 
which can occur between unstable proteins and flexible peptides should be much greater than 
interactions with the surfaces of rigid proteins. Thus, some of the stabilizing peptides may 
extend into the interior of the stabilized Ap protein, and/or may interact with non-contiguous 
regions of the protein. It is even possible that in cases where instability is due to folding 
intermediates, and not to a loss of stability of the active conformation, that stabilizing peptides 
may, in effect, catalyze the folding reaction without remaining structural components of the 
folded protein. 

We also expect that at least some of the Ap-stabilizing peptides will not require all 12 
amino acids for activity, and may be equally active as smaller peptides. We have stabilized 
other proteins with smaller peptides. For example, we have identified several linear tri- 
peptides, which when tethered to a carrier protein can stabilize an unstable fragment of P- 
lactamase. The peptide-stabilized (J-lactamase fragment can then complement a second 
fragment to form active 3-lactamase. We believe that the methods described herein can be 
broadly used to isolate peptides which can stabilize desired proteins, particularly those which 
do not produce screenable or selectable phenotypes. Such methods are not currently available, 
and since there are few if any reports in the literature of protein-stabilizing peptides, it is 
generally not appreciated that unstable proteins can be stabilized at will with appropriate 
peptides, and possibly small molecules derived therefrom. 

The foregoing results suggest a general procedure for selecting protein-stabilizing 
peptides from unconstrained terminal RPLs, which begins with expressing the unstable protein 
of choice as a C-terminal fusion with a flexible hydrophilic linker and a selectable marker in 
the appropriate compartment of E, coli, as illustrated in Figure 6. If the protein of choice is a 
secreted protein, as in the case of the Ap peptide, p-lactamase may be used as the C-terminal 
fusion parmer to allow periplasmic selection for p-lactam antibiotic resistance. A signal 
peptide must be encoded at the N-terminus for expon of the fusion protein to the bacterial 
periplasm. It is preferable to use the pl5A replicon with chloramphenicol resistance, so that 
universal RPLs can be constructed in the pUC phagemid with kanamycin resistance. If the 
protein of choice is cytoplasmic, as in die case of GFP described above, CAT (Genbank 
accession no. X06403; Rose, Nucleic Acids Research (1988) 16:355) may be used as the 
fusion parmer to allow selection for chloramphenicol resistance (Dekeyzer et aL, Protein 
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Engineering (1994) 7: 125-130; Zelazny and Bibi, Biochemistry (1996) 35: 10872-10878). In 
this case it is preferable to use the pl5A repJicon with ampiciliin resistance, so that the 
universal RPLs in the pUC phagemid with kanamycin resistance can be used. 

The first requirement which must be met is that the unstable protein must cause a 
substantial quantitative reduction in the selectable phenotype. This must be quantified and the 
minimum stringency must be established for quantitative selection, as was done for the use of 
ampiciliin resistance to select for stabilization of the AP protein. One or more universal RPLs 
may then be quantitatively introduced into cells expressing the fusion of the desired protein 
with the selector, and the transfectants are then plated onto the minimum concentration of 
antibiotic which is quantitatively non-permissive for growth of the fusion protein. The number 
of independent transformants plated should be equivalent to or greater than the size of the 
RPL, and the minimum non-permissive concentration of antibiotic should allow no colonies to 
grow from the same number of cells expressing the fusion protein alone. The RPL used was a 
l2-mer on the N-terminus of thioredoxin, but the RPL may vary in length from 3 to 20 or 
more residues on either end of the carrier. However, die proportion of unstable peptides in 
the RPL rapidly increases when the length exceeds - 12 amino acids. The carrier may be any 
stable protein which tolerates terminal fusions well. Selected peptides may be verified by co- 
expression with the free protein of choice. A substantial increase in the proportion of the 
protein which partitions into the soluble fraction should be observed in the presence of the 
selected peptide only and not in the presence of a non-selected peptide. 

Example 4 

Unstable proteins can be stabilized by tri-peptides selected from random peptide 
libraries 

The most common cause of instability in proteins is the tendency of folding 
intermediates, which may be accessed either on the folding pathway or by partial unfolding of 
the folded protein, to form inter-molecular associations between structural elements which 
normally associate intra-molecularly in the native conformation. Such inter-molecular 
associations may initiate polymerization reactions which lead to the formation of insoluble 
aggregates such as amyloid fibrils (Dobson, 1999, Trends Biochem, Sci. 24, 329-32). We 
wished to test the hypothesis that small peptides could be selected from random sequence 
libraries for their ability to protect unstable proteins from aggregation. To accomplish this we 
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utilized a fragment complementation system, which we had developed for the enzyme P- 
lactamase. E. coli TEM-I p-lactamase (Sutcliffe, 1978. Proc Natl Acad 5c. USA 75, 3737-41) 
may be separated into two fragments at E197-L198 which can complement to form active 
enzyme with the aid of interacting domains such as hetero-dimerizing helixes which are fused to 
the break-point termini of the fragments (Balint and Her, US Patent Application 60/124,339). 

The activity of the p-lactamase fragment complementation system is limited, however, 
by the stability of the N-terminal fragment, denoted al97. When al97 and the stable C-terminaJ 
fragment, ©198 were co-expressed in the £. coli periplasm as fusions to the hetero-dimerizing 
helixes of the c-fos and c-jun subunits of the transcription factor AP-1 (Karin et al., 1997, Curr 
Opin Cell Biol 9. 240-6), only enough p-lactamase activity was produced to confer a plating 
efficiency of -1% on 50 ^g/ml ampiciliin. However, when the fragment fusions were co- 
expressed with a library of random tri-peptides at the N-terminus of a carrier protein, E. coli 
thioredoxin {trxA: Genbank accession no. M54881) with an intervening Gly4Ser linker, four tri- 
peptides were independently selected which specifically increased p-Iactamase activity to confer 
100% plating efficiency on the host cells. These iri-peptides all turned out to have the same 
sequence, Gly-Arg-Glu (GRE). The GRE tripeptide conferred no resistance to ampiciliin in the 
absence of the interacting helixes, thus it does not stabilize the re-folded fragment complex, but 
rather it must stabilize the al97 fragment since activity is limited by the amount of soluble 
a 197. Since the GRE tri-peptide had the same stabilizing effect on a 1 97 fragment when a 
different carrier was used, its activity must be context independent. Thus, an 18 kDa enzyme 
fragment could be stabilized at least lOO-foId by a tri-peptide selected from a random sequence 
library. 

Interestingly, though the GRE tri-peptide could inhibit aggregation of a 1 97, it apparently 
did not interfere with re-folding of the fragment complex. Since aggregate formation proceeds 
exponentially, it is exquisitely sensitive to small shifts in the inter-molecular association rate 
constants (Dobson, 1999). Thus, even weak binding of an excess of the tri-peptide to the 
interacting surfaces could effectively defeat inter-molecular aggregation. On the other hand, 
cooperative folding of the fragment complex should readily displace the weakly bound iri- 
peptides because the effective inira-molecular concentrations of interacting structural elements 
relative to one another would be much higher than the tri-peptide concentration. In this way the 
general ability of small peptides to stabilize large proteins without interfering with protein 
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folding may be understood. We believe this phenomenon is not widely appreciated, and in fact 
this may be the tlrst demonstration that a functional protein could be deliberately stabilized by 
something as small as a tri-peptide. 

All publications and patent applications mentioned in this specification are indicative of 
the level of skill of those skilled in the art to which this invention pertains. All publications and 
patent applications are herein incorporated by reference to the same extent as if each individual 
publication or patent application was specifically and individually indicated to be incorporate by 
reference. 

The invention now having been fully described, it will be apparent to one of ordinary 
skill in the art that many changes and modifications can be made thereto without depaning from 
the spirit or scope of the appended claims. 
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WHAT IS CLAIMED IS: 

1 . A method for obtaining host cells expressing a mutant of a desired protein optimized for 
expression in said host cells, said method comprising: 

5 expressing a library of mutagenized coding sequences for said desired protein as individual 

fusion proteins in a multiplicity of said host cells grown under selective conditions, wherein 
the coding sequence for each fusion protein comprises a member of said library of 
mutagenized coding sequences operably linked to a coding sequence for a selector protein, 
expression of which confers ability to grow under said selective conditions on said host cells; 
H) and 

identifying host cells that express a fusion protein comprising a mutagenized desired 
protein and a selector protein under selective conditions that are nonpermissive for host cells 
expressing a fusion protein comprising an unmutagenized desired protein and said selector 
protein as indicative of host cells expressing a mutant of said desired protein optimized for 
15 expression in said host cells. 

2. The method according to Claim 1 , wherein said protein is heterologous to said host cell. 

3. The method according to Claim I, wherein said selective conditions are exposure to an 
antibiotic to which said host cells are sensitive in the absence of said selector protein. 

4. The method according to Claim 3, wherein said antibiotic is chloramphenicol. 

20 5. The method according to Claim 3, wherein said antibiotic is a p-Iactam antibiotic. 

6. The method according to Claim 1, wherein said host cells are prokaryotic cells. 

7. The method according to Claim 6, wherein said prokaryotic cells are £. coli. 

8. The method according to Claim 1, wherein said desired protein lacks a selectable 
phenotype. 

25 9. The method according to Claim I, wherein said member of said library of mutagenized 
coding sequences is operably linked to said coding sequence for a selector protein via a 
peptide linker. 

10. The method according to Claim 9, wherein said peptide linker is flexible. 

1 1 . The method according to Claim I , wherein said member of said library of mutagenized 
30 coding sequences is operably linked 5' to said coding sequence for a selector protein. 

12. A method for obtaining E, coli host cells expressing a mutant of a desired protein 
optimized for expression in said cells, said method comprising: expressing a library of 
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mutagenized coding sequences for said desired protein as individual fusion proteins in a 
multiplicity of said host cells grown in the presence of an antibiotic, wherein the coding 
sequence for each fusion protein comprises a member of said library of mutagenized 
coding sequences operably linked to a coding sequence for a selector protein, expression of 
5 which confers ability to grow in the presence of said antibiotic; and 

identifying host cells thai express a fusion protein comprising a mutagenized desired 
protein and a selector protein at a concentration of said antibiotic that is nonpermissive for host 
cells expressing a fusion protein comprising an unmutagenized desired protein and said 
selector protein as indicative of host cells expressing a mutant of said desired protein optimized 
10 for expression in said host cells. 

13. The method according to Claim 12, wherein said antibiotic is chloramphenicol and said 
selector protein is chloramphenicol acetyl tranferase. 

14. The method according to Claim 12, wherein said antibiotic is a P-lactam antibiotic and said 
selector protein a P-lacumase. 

15 15. A method for obtaining a mutant of a desired protein optimized for expression in a host 

cell, wherein said mutant maintains at least one or more functional characteristic of interest 
of a wild type of said desired protein, said method comprising: 

isolating a plurality of mutants of said desired protein from host cells obtained according to 
the method of Claim 1 or Claim 12; and 
20 screening said plurality of mutants for a mutant comprising said functional characteristic of 

interest. 

16. The method according to Claim 15, wherein said functional characteristic of interest of 
said mutant as compared to a wild type protein is one or more characteristic selected from 
the group consisting of enzymatic activity, fluorescence, immunogenicity, solubility, pH 

25 sensitivity, temperature sensitivity and half-life. 

17. The method according to Claim 15, wherein said functional characteristic of interest of 
said mutant as compared to a wild type protein is one or more characteristic selected from 
the group consisting of increased solubility, decreased temperature sensitivity and 
increased half-life. 
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18. A method for obtaining a mutant of green fluorescent protein having at least one altered 
characteristic as compared to a wild type green fluorescent protein, said method 
comprising: 

expressing a library of mutagenized coding sequences for said green fluorescent protein as 
individual ftision proteins in a multiplicity of host cells grown under selective conditions, 
wherein the coding sequence for each ftision protein comprises a member of said library of 
mutagenized coding sequences operably linked to a coding sequence for a selector protein, 
expression of which confers resistance to said selective conditions on said host cells; 

identifying host cells that express a fusion protein comprising a mutagenized green 
fluorescent protein and a selector protein under selective conditions that are nonpermissive for 
host cells expressing a ftision protein comprising an unmutagenized green fluorescent protein 
and said selector protein as indicative of host cells expressing a mutant of said green 
fluorescent protein optimized for expression in said host cells; 

isolating a plurality of mutants of said green fluorescent protein from host cells expressing 
a mutant of said green fluorescent protein optimized for expression in said host cells; and 

screening said plurality of mutants for a mutant having at least one altered characteristic 
whereby a mutant of green fluorescent protein having at least one altered characteristic as 
compared to a wild type green fluorescent protein is obtained. 

19. The method according to Claim 18, wherein said at least one altered characteristic is 
selected from the group consisting of solubility, fluorescence, stability, and absorption 
spectrum. 

20. The method according to Claim 18, wherein said member of said library of mutagenized 
coding sequences is operably linked 5' to said coding sequence for a selector protein. 

21. A nucleic acid comprising a coding region for a green fluorescent protein, wherein said 
nucleic acid comprises a mutated sequence at codon 105. 

22. The nucleic acid according to Claim 21. wherein said mutated sequence at codon 105 is 
AGC. 

23. The nucleic acid according to Claim 21, ftirther comprising a mutated sequence at codon 
165. 

24. The nucleic acid according to Claim 23, wherein said mutated sequence at codon 165 is 
GCT. 
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25. A nucleic acid comprising a coding region for a green fluorescent protein, where said 
green fluorescent protein has a serine instead of an asparagine at residue 105. 

26. The nucleic acid according to Claim 25, wherein said green fluorescent protein further 
comprises an alanine instead of a valine at residue 164. 

27. A cell comprising a nucleic acid according to Claim 25, wherein the codon for said serine 
is a codon preferred by said cell. 

28. The cell according to Claim 26, wherein said cell is an E. coli cell and said codon is AGC. 

29. A cell comprising a nucleic acid according to Claim 22, wherein the codons for said 
alanine and said serine are codons preferred by said cell. 

30. The cell according to Claim 29, wherein said cell is an E. coli cell and said codons are 
AGC and OCT respectively. 

31. A green fluorescent protein comprising a serine instead of an asparagine at residue 105. 

32. The green fluorescent protein according to Claim 31, further comprising a phenylalanine 
instead of a valine at residue 164. 

33. A method for identifying a DNA sequence which encodes a stable polypeptide in a 
randomly fragmented population of DNA, said method comprising: 

expressing members of said randomly fragmented population of DNA as individual flision 
expression products in a multiplicity of said host cells grown under selective conditions, 
wherein the coding sequence for each fusion expression product comprises a member of said 
randomly fragmented population of DNA operably linked to a coding sequence for a selector 
protein, expression of which confers resistance to said selective conditions on said host cells; 

screening for host cells that express a fusion expression product under selective conditions 
as indicative of host cells containing a DNA sequence that encodes a stable polypeptide; and 

identifying said DNA sequence. 

34. The method according to Claim 33, wherein said DNA is cDNA. 

35. A method for obtaining peptides that improve at least one of the solubility or functional 
propenies of a free defective protein, said method comprising; 

coexpressing in a multiplicity of host cells grown under selective conditions (a) a defective 
fusion protein comprising said defective protein and a selector protein, expression of which 
confers resistance to said selective conditions on said host cells and (b) a member of a tethered 
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random peptide library comprising a linear chain of about 3 to 20 amino acids fused via a 

flexible linker to a stable carrier; 

isolating host cells that express said defective fusion under selective conditions that are not 

permissive tor host cells expressing (a) alone as indicative of host cells expressing a peptide 
5 that improves said defective fusion protein whereby host cells containing peptides that improve 

said defective fusion protein are obtained; 

identifying said peptides that improve said defective fusion protein; and 
screening said peptides that improve said defective fusion protein for those peptides that 
improve at least one of the solubility and functional properties of said free defective 
10 protein. 

36. The method according to Claim 35, wherein said random peptide library is a linear chain of 
12 randomly encoded amino acids fused via a flexible linker to the N-terminus off". coU 
thioredoxin. 

37. The method according to Claim 35» wherein said free defective protein is a secreted protein 
15 and said selector protein is a secreted protein. 

38. A method for obtaining peptides that improve the solubility of a human amyloid P peptide, 
said method comprising: 

coexpressing in a multiplicity of host cells grown under selective conditions (a) a defective 
fusion protein comprising said human amyloid P peptide and a secreted selector protein, 
20 expression of which confers resistance to said selective conditions on said host cells and (b) a 
member of a tethered random peptide library comprising a linear chain of about 3 to 20 amino 
acids fused via a flexible linker to a stable carrier; 

isolating host cells that express said defective fusion protein under selective conditions that 
are not permissive for host cells expressing (a) alone as indicative of host ceils expressing a 
25 peptide that improves said defective fusion protein whereby host cells containing peptides that 
improve said defective fusion protein are obtained; 

identifying said peptides that improve said defective fusion protein; and 
screening said peptides that improve said defective fusion protein for those peptides that 
improve the solubility said human amyloid P peptide. 
30 39. The method according to Claim 38, wherein said linear chain comprises about 3 to 12 
amino acids. 
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40. A complex comprising a peptide identified according to the method of Claim 38 and a 
human amyloid (} peptide. 

41. A pharmaceutical composition comprising a peptide identified according to the method of 
Claim 38. 

42. A peptidomimetic of a peptide identified according to the method of Claim 38. 

43. A composition comprising the peptidomimetic according to Claim 42. 
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Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 

This International Search Report has not been established in respect of certain daims under Article 17(2)(a) tor the following reasons: 
1. Q Claims Nos.: 

because they relate to subject matter not required to be searched by this Authority, namety: 



2. H Claims Nos.: ^0-43 
because they relate to parts of the 
an extent that no meaningful Intern 

see FURTHER INFORMATION sheet PCT/ISA/210 



because they relate to parts of the International Application that do not comply with the prescribed requirements to such 
an extent that no meaningful International Search can tw carried out. specificaUy: 



3. Claims Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences o( Rule 6.4(a). 

Box II Observations where unity of invention is tacking (Continuation of item 2 of first sheet) 

This lnterT>ationaJ Searching Authority fouryj multiple inventions in this intenwtional application, as follows: 

see additional sheet 



1 • I I As all required additional search lees were timely paid by the applicant, this Interrational Search Report covers al 
' — ' searchable claims. 



As all searchable claims could be searched without effort justifying an additionai fee, this Authority did not irwite oavment 
of any additional fee. 



^' nn ^ ^'"^ °' required additional search fees were timely paid by the applicant, this International Search Report 
covers only those daims for which fees were paid, specifically claims Nos.: 

1-20,33-43 



No required additional search fees were timely paid by the applicant. Consequently, this Intemationai Search Report is 
restncted to the irtvention first mentioned in the claims; it is covered tjy daims Nos.: 



Remark on Protest The additional search fees were accompanied by the appBcanfs protest. 

fx] '^pnjtest accompanied the payment of additional search fees. 
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Continuation of Box 1.2 
Claims Nos.: 40-43 



Claims 40-43 refer to a peptide and peptidomimetic that improve the 
solubility of beta-amyloid without giving a true technical 
characterization. Moreover, no such peptides are defined in the 
application. In consequence, the scope of said claims is ambiguous and 
vague, and their subject-matter is not sufficiently disclosed and 
supported (Art. 5 and 6 PCT). No search can be carried out for such ourelv 
speculative claims whose wording is, in fact, a mere recitation of the 
results to be achieved. 

The applicant's attention is drawn to the fact that claims, or parts of 
claims, relating to inventions in respect of which no international 
search report has been established need not be the subject of an 
international preliminary examination (Rule 66.1(e) PCT). The aDDlicant 
IS advised that the EPO policy when acting as an International 
Preliminary Examining Authority is normally not to carry out a 
preliminary examination on matter which has not been searched This is 
the case irrespective of whether or not the claims are amended following 
receipt of the search report or during any Chapter II procedure 



Intemationai Application No. PCT/US QO 708477 



FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 210 



This Intemationai Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-20 

A method for obtaining host cells expressing a mutant of a 
desired protein optimized for exression in said cells, said 
method comprising: Expressing a library of mutagenized 
coding sequences for said protein as fusion proteins to a 
coding sequence for a selector protein and identifying cells 
under non-permissive conditions for host cells expressing a 
fusion protein with an unmutageni zed desired protein. 
Said method where the selector is an antibiotic resistance 
marker and the desired protein is green fluorescent protein 

2. Claims: 21-32 

A nucleic acid comprising a coding region for a green 
fluorescent protein with a mutated sequence at codon 105 
preferentially coding for a serine instead of an asparagine. 
A cell comprising the above mentioned nucleic acid. A green 
fluorescent protein comprising a serine instead of an 
asparagine at residue 105. 

The above mentioned nucleic acid additionally comprising a 
mutated codon 164. A cell comprising said nucleic acid and a 
green fluorescent protein con^rising the additional mutation 
at codon 164. 



3. Claims: 33,34 

A method for identifying a DNA sequence which encodes a 
stable polypeptide in a randanly fragmented population of 
DNA, preferably cDNA, said method comprising: 
Expressing a library of randomly fragmented sequences as 
fusion proteins to a coding sequence for a selector protein 
which can confer resistance to selective conditions to a 
cell, and identifying cells under selective conditions 
indicative for host cells expressing a fusion protein 
comprising a stable polypeptide. 



4. Claims: 35-43 



A method for obtaining peptides that improve at least one of 
the solubility or functional properties of a free defective 
protein, said method comprising: 

-coexpressing a) a fusion protein conprising said defective 
protein and a selector protein and b) a member of a tethered 
random peptide library fused via a flexible linker to a 
stable carrier; 

-isolating host cells expressing under conditions that are 
not permissive for host cells expressing (a) alone, 
-identifying peptides that improve said defective fusion 
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protein and screening said peptides for those peptides that 
improve at least one of the solubility and functional 
properties of the defective protein. 

Said method where the defective protein is the human amyloid 
beta peptide. 

A COTiplex comprising a peptide identified according to the 
above mentioned method and a human amyloid beta peptide. A 
peptidomimetic of said peptide. A phannaceutical composition 
conprising said peptide or said peptidomimetic. 
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